{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The EvoHDP may be very slow if the amount of data is high. The main reason is about the lack of efficiency in the list computations. Let's see a few basic example to distinguish the main differences bewtween both."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1D analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n",
      "<class 'numpy.ndarray'>\n"
     ]
    }
   ],
   "source": [
    "liste = [0]*1000\n",
    "arr = np.zeros((1000))\n",
    "print(type(liste))\n",
    "print(type(arr))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fill a list and a np array "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to fill a list : 0.00040912628173828125\n",
      "Time to fill an array : 0.0006341934204101562\n"
     ]
    }
   ],
   "source": [
    "time_start=time.time()\n",
    "for i in range(len(liste)):\n",
    "    liste[i]=i\n",
    "print(\"Time to fill a list : {}\".format(time.time()-time_start))\n",
    "\n",
    "time_start=time.time()\n",
    "for i in range(len(arr)):\n",
    "    arr[i]=i\n",
    "print(\"Time to fill an array : {}\".format(time.time()-time_start))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To assign values, a list is relatively faster"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apply a sum to a list and np array with \"sum\" and \"np.sum\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to fill a list : 0.05921506881713867\n",
      "Time to fill an array : 0.010154008865356445\n"
     ]
    }
   ],
   "source": [
    "time_start=time.time()\n",
    "for i in range(len(liste)):\n",
    "    liste[i]=sum(liste)\n",
    "print(\"Time to apply a sum for a list : {}\".format(time.time()-time_start))\n",
    "\n",
    "time_start=time.time()\n",
    "for i in range(len(arr)):\n",
    "    arr[i]=np.sum(arr)\n",
    "print(\"Time to apply a sum for an array : {}\".format(time.time()-time_start))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2D Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n",
      "<class 'numpy.ndarray'>\n"
     ]
    }
   ],
   "source": [
    "liste = [[0]*10]*10\n",
    "arr = np.zeros((10,10))\n",
    "print(type(liste))\n",
    "print(type(arr))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to fill a list : 0.00012683868408203125\n",
      "Time to fill an array : 0.00016999244689941406\n"
     ]
    }
   ],
   "source": [
    "time_start=time.time()\n",
    "for i in range(len(liste)):\n",
    "    for j in range(len(liste[i])):\n",
    "        liste[i][j]=i+j\n",
    "print(\"Time to fill a list : {}\".format(time.time()-time_start))\n",
    "\n",
    "time_start=time.time()\n",
    "for i in range(len(arr)):\n",
    "    for j in range(len(arr[i])):\n",
    "        arr[i,j]=i+j\n",
    "print(\"Time to fill an array : {}\".format(time.time()-time_start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to apply a sum for a list : 0.00035881996154785156\n",
      "Time to apply a sum for an array : 0.001500844955444336\n"
     ]
    }
   ],
   "source": [
    "time_start=time.time()\n",
    "for i in range(len(liste)):\n",
    "    for j in range(len(liste[i])):\n",
    "        liste[i][j]=sum(liste[i])\n",
    "print(\"Time to apply a sum for a list : {}\".format(time.time()-time_start))\n",
    "\n",
    "time_start=time.time()\n",
    "for i in range(len(arr)):\n",
    "    for j in range(len(arr[i])):\n",
    "        arr[i,j]=np.sum(arr[i,:])\n",
    "print(\"Time to apply a sum for an array : {}\".format(time.time()-time_start))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 3D Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n",
      "<class 'numpy.ndarray'>\n"
     ]
    }
   ],
   "source": [
    "liste = [[[0]*100]*100]*10\n",
    "arr = np.zeros((100,100,10))\n",
    "print(type(liste))\n",
    "print(type(arr))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to fill a list : 0.035807132720947266\n",
      "Time to fill an array : 0.05226325988769531\n"
     ]
    }
   ],
   "source": [
    "time_start=time.time()\n",
    "for i in range(len(liste)):\n",
    "    for j in range(len(liste[i])):\n",
    "        for k in range(len(liste[i][j])):\n",
    "            liste[i][j][k]=i+j\n",
    "print(\"Time to fill a list : {}\".format(time.time()-time_start))\n",
    "\n",
    "time_start=time.time()\n",
    "for i in range(len(arr)):\n",
    "    for j in range(len(arr[i])):\n",
    "        for k in range(len(arr[i,j])):\n",
    "            arr[i,j,k]=i+j\n",
    "print(\"Time to fill an array : {}\".format(time.time()-time_start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to apply a sum for a list : 21.207592964172363\n",
      "Time to apply a sum for an array : 0.9774158000946045\n"
     ]
    }
   ],
   "source": [
    "time_start=time.time()\n",
    "for i in range(len(liste)):\n",
    "    for j in range(len(liste[i])):\n",
    "        for k in range(len(liste[i][j])):\n",
    "            liste[i][j][k]=sum(liste[i][j])\n",
    "print(\"Time to apply a sum for a list : {}\".format(time.time()-time_start))\n",
    "\n",
    "time_start=time.time()\n",
    "for i in range(len(arr)):\n",
    "    for j in range(len(arr[i])):\n",
    "        for k in range(len(arr[i,j])):\n",
    "            arr[i,j,k]=np.sum(arr[i,j,:])\n",
    "print(\"Time to apply a sum for an array : {}\".format(time.time()-time_start))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### NP Array or List"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I recomment to use numpy array for computations (sum, addition, multiplication) if you have fixed size dimensions. If you just want to append values and save data, use a list. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Link with EvoHDP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last example is important for our algorithm because it explains the duration of the algorithm EvoHDP\n",
    "In the EvoHDP, we need some list because the number of data in corpus j at time t is different. \n",
    "Collapsed Gibbs Sampling used the same kind of loop for W words, K topics and this for each document. This three dimensional sampling makes the algorithm difficult to converge"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
